World Wide Web Information Retrieval Using Web Connectivity Information
Abstract
Gathering, processing, and distributing information from the World Wide Web will be a vital technology for the next century. Web search techniques have played a critical role in the development of information systems. Because web documents are so diverse, traditional search techniques must be improved. Methods based on hyperlink structure have proved to be powerful ways of exploring the relationships between web documents. In this project, a prototype web search engine was developed to exploit the link structure of web documents, based on the Companion algorithm. The prototype consists of a web spider, a local database, and search software, and was written in the Java programming language. The spider crawls and downloads web pages using Lynx, then saves the hyperlinks into an Oracle database; database access is implemented with JDBC. The search software builds a vicinity graph for the query URL, calculates hub and authority weights, and returns the most related pages. Finally, HTML pages provide the user interface and communicate with CGI programs written in Perl.
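The hub and authority calculation mentioned in the abstract can be illustrated with a minimal Java sketch. This is not the project's code: it shows only the standard hub/authority iteration on a small, hypothetical four-page graph, and it omits the other steps of the Companion algorithm (building the vicinity graph from the stored hyperlinks, duplicate handling, edge weighting). The class name `HubAuthoritySketch`, the method names, and the sample graph are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class HubAuthoritySketch {

    /**
     * Runs the hub/authority iteration on a directed graph.
     * links.get(i) lists the indices of the pages that page i links to.
     * Returns { hubScores, authorityScores }, each scaled to unit length.
     */
    static double[][] hubsAndAuthorities(List<int[]> links, int iterations) {
        int n = links.size();
        double[] hub = new double[n];
        double[] auth = new double[n];
        Arrays.fill(hub, 1.0);
        Arrays.fill(auth, 1.0);

        for (int it = 0; it < iterations; it++) {
            // Authority update: add up the hub scores of the pages linking in.
            double[] newAuth = new double[n];
            for (int i = 0; i < n; i++) {
                for (int j : links.get(i)) {
                    newAuth[j] += hub[i];
                }
            }
            normalize(newAuth);

            // Hub update: add up the new authority scores of the pages linked to.
            double[] newHub = new double[n];
            for (int i = 0; i < n; i++) {
                for (int j : links.get(i)) {
                    newHub[i] += newAuth[j];
                }
            }
            normalize(newHub);

            auth = newAuth;
            hub = newHub;
        }
        return new double[][] { hub, auth };
    }

    /** Scales the vector to unit Euclidean length; leaves an all-zero vector alone. */
    static void normalize(double[] v) {
        double sumSq = 0.0;
        for (double x : v) {
            sumSq += x * x;
        }
        double norm = Math.sqrt(sumSq);
        if (norm == 0.0) {
            return;
        }
        for (int i = 0; i < v.length; i++) {
            v[i] /= norm;
        }
    }

    public static void main(String[] args) {
        // Hypothetical graph: pages 0 and 1 act as hubs, pages 2 and 3 as authorities.
        List<int[]> links = Arrays.asList(
            new int[] {2, 3},   // page 0 links to pages 2 and 3
            new int[] {2},      // page 1 links to page 2
            new int[] {},       // page 2 has no outgoing links
            new int[] {}        // page 3 has no outgoing links
        );
        double[][] scores = hubsAndAuthorities(links, 20);
        System.out.println("hubs        = " + Arrays.toString(scores[0]));
        System.out.println("authorities = " + Arrays.toString(scores[1]));
    }
}
```

In this sample graph, page 2 receives links from both hubs and so ends with a higher authority score than page 3; a search front end would list the pages with the highest authority scores as the most related ones.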
Similar resources
Assessing the Internal Structure of the Ellis Information Retrieval Model in Order to Present the Persian Norm of Web Retrieval Tools
Introduction: This study evaluated the internal structure of the Ellis information-seeking model in the student community, with the aim of presenting the Persian norm. Methods: This descriptive-analytical study was conducted using a cross-sectional survey in the second semester of the academic year 1399-1400. The population comprised 280 graduate students at Ahvaz Jundishapur University of Medical Scien...
Behavioral Considerations in Developing Web Information Systems: User-centered Design Agenda
The current paper explores designing a web information retrieval system regarding the searching behavior of users in real and everyday life. Designing an information system that is closely linked to human behavior is equally important for providers and the end users. From an Information Science point of view, four approaches in designing information retrieval systems were identified as system-...
Multilingual Information Retrieval in World Wide Web
The article addresses: (1). The design of an information retrieval (IR), as the Multilingual Information Retrieval Tool Hierarchy (MIRTH), which with virtual corpora on the World Wide Web, also known as Web or WWW. It is motivated by the desire to create a search engine to retrieve information by accessing a virtual. (2). The implementation of a general model of multilingual retrieval for the W...
A Comparison of Techniques to Find Mirrored Hosts on the WWW
We compare several algorithms for identifying mirrored hosts on the World Wide Web. The algorithms operate on the basis of URL strings and linkage data: the type of information easily available from web proxies and crawlers. Identification of mirrored hosts can improve web-based information retrieval in several ways: First, by identifying mirrored hosts, search engines can avoid storing and ret...
Retrieval of Web Documents Using a Fuzzy Hierarchical Clustering
The World Wide Web holds a huge amount of information, which is retrieved using information retrieval tools such as search engines. The page repository of a search engine contains the web documents downloaded by the crawler, and these documents come from a variety of domains. In this paper, a technique called “Retrieval of Web documents using a fuzzy hierarchical clustering” is being propo...
Publication date: 2001